Current Issues
Current Issues in Philology and Pedagogical Linguistics
Aktualnye problemi filologii i pedagogicheskoi lingvistiki
RUSSIAN JOURNAL OF LINGUISTICS
ISSN 2079-6021 (Print)
ISSN 2619-029X (Online)



DOI: https://doi.org/10.29025/2079-6021-2024-2-18-29

Search algorithms of verbal identity markers in modern scientific discourse

Authors: Goncharova O.V., Zavrumov Z.A., Khaleeva S.A.



Abstract: The article examines how identity is verbalized, using Data Mining methods. The research material consists of English-language texts from online scientific repositories and e-libraries that address various concepts of youth identity. A methodology based on modern natural language processing and machine learning tools was developed and tested as part of the research. The Natural Language Toolkit (NLTK) library was used for tokenization and POS tagging, and the frequency of tokens occurring in the context of the word "identity" was calculated. Word embeddings, a pre-trained Word2Vec model, and the K-means algorithm were used for the subsequent analysis and clustering of words by semantic proximity; the Gensim and Scikit-learn libraries were used to work with the Word2Vec model. The results show that in English scientific discourse a young person's identity is verbalized within 9 semantic categories: behavior, communities, communication, education, identity, language, practice, complexity, and science, the most common of which are education (33%), language (21%), and communities (18%). N-gram analysis made it possible to identify semantic fields, establish their attributes, and evaluate the similarity of texts, which provided the most accurate vector-space search for semantically close n-grams. Optimization made it possible to establish a similarity measure for ranking phrases against a query and to assign each n-gram a ranking weight. Further improvements can be achieved by adding statistical word weighting, such as TF-IDF. The proposed system can search a large text collection for related phrases with similar meaning.
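The pipeline outlined in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. A regular-expression tokenizer stands in for NLTK tokenization, and TF-IDF bag-of-words vectors stand in for the pre-trained Word2Vec embeddings, so the similarity here is lexical rather than semantic. POS tagging and K-means clustering are omitted. All function names (`tokenize`, `context_counts`, `ngrams`, `rank_phrases`), the window size, and the sample sentence are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer; a regex stand-in for NLTK tokenization."""
    return re.findall(r"[a-z]+", text.lower())


def context_counts(tokens, target="identity", window=2):
    """Count tokens found within `window` positions of each `target` token."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts


def ngrams(tokens, n=2):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF dicts; also return the IDF table."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so that every term seen in the phrases has a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_phrases(query, phrases):
    """Rank phrases by TF-IDF cosine similarity to the query, best first."""
    vecs, idf = tfidf_vectors([tokenize(p) for p in phrases])
    # Query terms never seen in the phrases get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), p) for v, p in zip(vecs, phrases)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample text, used only to exercise the functions above.
sample = "Youth identity is shaped by language; identity and education interact."
tokens = tokenize(sample)
neighbours = context_counts(tokens, window=1)
candidates = [" ".join(g) for g in ngrams(tokens, 2)]
ranked = rank_phrases("language identity", candidates)
```

Here `neighbours` holds the frequency of words adjacent to "identity", and `ranked` lists the bigrams of the sample ordered by their ranking weight for the query. Replacing the TF-IDF dictionaries with averaged Word2Vec vectors would move the same ranking step from lexical overlap toward the semantic proximity the abstract describes.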

Keywords: Data Mining, Python, semantic category, identity verbalization, youth identity, scientific discourse, Internet scientific repositories.

For citation: Goncharova O.V., Zavrumov Z.A., Khaleeva S.A. Search algorithms of verbal identity markers in modern scientific discourse. Current Issues in Philology and Pedagogical Linguistics. 2024, no 2, pp. 18–29. https://doi.org/10.29025/2079-6021-2024-2-18-29 (In Russ.).

Bionote:
*Oksana V. Goncharova¹, Zaur A. Zavrumov², Svetlana A. Khaleeva³
¹ ² ³ Pyatigorsk State University,
9 Kalinin Ave., Pyatigorsk, Russian Federation, 357532
¹ORCID ID: 0000-0003-1044-6244; ²ORCID ID: 0000-0001-6351-826X; ³ORCID ID: 0000-0003-1723-3348
¹Web of Science Researcher ID: C-4671-2017; ²Web of Science Researcher ID: S-4539-2018
¹Scopus Author ID: 56037850600; ²Scopus Author ID: 57189696996; ³Scopus Author ID: 56028156500
*e-mail: oxanavgoncharova@gmail.com



